TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Scottish Corpus of Texts and Speech: Problems of Corpus Design

Internal identifier: 000234 (Main/Exploration); previous: 000233; next: 000235

Authors: Fiona M. Douglas [United Kingdom]

RBID : ISTEX:8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E

Abstract

In recent years, the use of large corpora has revolutionized the way we study language. There are now numerous well‐established corpus projects, which have set the standard for future corpus‐based research. As more and more corpora are developed and technology continues to offer greater and greater scope, the emphasis has shifted from corpus size to establishing norms of good practice. There is also an increasingly critical appreciation of the crucial role played by corpus design. Corpus design can, however, present peculiar problems for particular types of source material. The Scottish Corpus of Texts and Speech (SCOTS) is the first large‐scale corpus project specifically dedicated to the languages of Scotland, and therefore it faces many unanswered questions, which will have a direct impact on the corpus design. The first phase of the project will focus on the language varieties Scots and Scottish English, varieties that are themselves notoriously difficult to define. This paper outlines the complexities of the Scottish linguistic situation, before going on to examine the problematic issue of how to construct a well‐balanced and representative corpus in what is largely uncharted territory. It argues that a well‐formed corpus cannot be constructed in a linguistic vacuum, and that familiarity with the overall language population is essential before effective corpus sampling techniques, methodologies, and categorization schema can be devised. It also offers some preliminary methodologies that will be adopted by SCOTS.

DOI: 10.1093/llc/18.1.23


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Scottish Corpus of Texts and Speech: Problems of Corpus Design</title>
<author wicri:is="90%">
<name sortKey="Douglas, Fiona M" sort="Douglas, Fiona M" uniqKey="Douglas F" first="Fiona M." last="Douglas">Fiona M. Douglas</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1093/llc/18.1.23</idno>
<idno type="url">https://api.istex.fr/document/8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000464</idno>
<idno type="wicri:Area/Istex/Curation">000464</idno>
<idno type="wicri:Area/Istex/Checkpoint">000191</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000191</idno>
<idno type="wicri:doubleKey">0268-1145:2003:Douglas F:the:scottish:corpus</idno>
<idno type="wicri:Area/Main/Merge">000255</idno>
<idno type="wicri:Area/Main/Curation">000234</idno>
<idno type="wicri:Area/Main/Exploration">000234</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">The Scottish Corpus of Texts and Speech: Problems of Corpus Design</title>
<author wicri:is="90%">
<name sortKey="Douglas, Fiona M" sort="Douglas, Fiona M" uniqKey="Douglas F" first="Fiona M." last="Douglas">Fiona M. Douglas</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>University of Glasgow, Glasgow</wicri:regionArea>
<orgName type="university">Université de Glasgow</orgName>
<placeName>
<settlement type="city">Glasgow</settlement>
<region type="country">Écosse</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2003-04">2003-04</date>
<biblScope unit="volume">18</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="23">23</biblScope>
<biblScope unit="page" to="37">37</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E</idno>
<idno type="DOI">10.1093/llc/18.1.23</idno>
<idno type="local">180023</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In recent years, the use of large corpora has revolutionized the way we study language. There are now numerous well‐established corpus projects, which have set the standard for future corpus‐based research. As more and more corpora are developed and technology continues to offer greater and greater scope, the emphasis has shifted from corpus size to establishing norms of good practice. There is also an increasingly critical appreciation of the crucial role played by corpus design. Corpus design can, however, present peculiar problems for particular types of source material. The Scottish Corpus of Texts and Speech (SCOTS) is the first large‐scale corpus project specifically dedicated to the languages of Scotland, and therefore it faces many unanswered questions, which will have a direct impact on the corpus design. The first phase of the project will focus on the language varieties Scots and Scottish English, varieties that are themselves notoriously difficult to define. This paper outlines the complexities of the Scottish linguistic situation, before going on to examine the problematic issue of how to construct a well‐balanced and representative corpus in what is largely uncharted territory. It argues that a well‐formed corpus cannot be constructed in a linguistic vacuum, and that familiarity with the overall language population is essential before effective corpus sampling techniques, methodologies, and categorization schema can be devised. It also offers some preliminary methodologies that will be adopted by SCOTS.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
<region>
<li>Écosse</li>
</region>
<settlement>
<li>Glasgow</li>
</settlement>
<orgName>
<li>Université de Glasgow</li>
</orgName>
</list>
<tree>
<country name="Royaume-Uni">
<region name="Écosse">
<name sortKey="Douglas, Fiona M" sort="Douglas, Fiona M" uniqKey="Douglas F" first="Fiona M." last="Douglas">Fiona M. Douglas</name>
</region>
</country>
</tree>
</affiliations>
</record>
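
As a minimal sketch, the record above could be read with Python's standard `xml.etree.ElementTree` to pull out the title and identifiers. The record as displayed uses the `wicri:` prefix without a visible namespace declaration, which a strict XML parser rejects, so the sketch binds that prefix to a placeholder URI first. The URI `urn:example:wicri`, the helper names `parse_record` and `idno_map`, and the shortened `RECORD` sample are assumptions made for illustration; they are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the record above; only a few fields are kept for brevity.
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Scottish Corpus of Texts and Speech: Problems of Corpus Design</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E</idno>
<idno type="doi">10.1093/llc/18.1.23</idno>
<idno type="wicri:Area/Main/Exploration">000234</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI (an assumption, not the real Wicri namespace).
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Parse a record after binding the undeclared wicri: prefix."""
    patched = text.replace(
        "<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1
    )
    return ET.fromstring(patched)


def idno_map(root):
    """Return {type: value} for every <idno> under <publicationStmt>."""
    return {
        e.get("type"): (e.text or "").strip()
        for e in root.findall(".//publicationStmt/idno")
    }


root = parse_record(RECORD)
title = root.findtext(".//titleStmt/title")
ids = idno_map(root)
print(title)
print(ids["doi"], ids["wicri:Area/Main/Exploration"])
```

Patching the root tag keeps the sketch dependency-free. The `type` values such as `wicri:Area/Main/Exploration` are ordinary attribute strings, so they are looked up verbatim rather than resolved as namespaced names.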

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000234 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000234 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:8A136A881E2F17F4A79E29FDB1EFC0B4490DF24E
   |texte=   The Scottish Corpus of Texts and Speech: Problems of Corpus Design
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024